Abstract

When considering the problem of forecasting a continuous-time stochastic process over an entire time interval in terms of its recent past, the notion of Autoregressive Hilbert space processes (ARH) arises. This model can be seen as a generalization of the classical autoregressive processes to Hilbert space valued random variables. Its estimation presents several challenges that have been addressed by many authors in recent years.

In this paper, we propose an extension of this model by introducing a conditioning process on the ARH. In this way, we pursue a double objective. First, the intrinsic linearity of ARH is overcome. Second, we allow the introduction of exogenous covariates in this function-valued time series model.

We begin by defining a new kind of process that we call Conditional ARH. We then propose estimators for the infinite-dimensional parameters associated with such processes. Using two classes of predictors defined within the ARH framework, we extend these to our case. Consistency results are provided, as well as a real data application related to electricity load forecasting.
1. Introduction

We consider a function-valued process $Z = (Z_k, k \in \mathbb{Z})$ where, for each $k$, $Z_k$ is a random element taking its values in some functional space $F$. A popular choice is to set $F = H$, a real separable Hilbert space, because of the rich geometric properties of Hilbert spaces. As for classical time series, an important task is to obtain some information about the future value $Z_{n+1}$ from the observed discrete sequence $Z_1, \ldots, Z_n$. The best predictor (in the quadratic mean loss sense) of the future observation $Z_{n+1}$ is its conditional expectation given the past,

\[ \tilde{Z}_{n+1} = \mathbb{E}(Z_{n+1} \mid Z_n, \ldots, Z_1), \tag{1} \]

which may depend on the unknown distribution of $Z$.

One important case arises when one assumes that $Z$ is a strictly stationary zero-mean Autoregressive Hilbertian process of order 1, ARH(1), introduced by Bosq and defined by

\[ Z_{k+1} = \rho Z_k + \epsilon_k, \quad k \in \mathbb{Z}, \tag{2} \]

with $\rho$ a bounded linear operator over $H$ and $\epsilon = (\epsilon_k, k \in \mathbb{Z})$ a strong $H$-valued white noise. For this process, the best predictor of $Z_{n+1}$ given the past observations is $\tilde{Z}_{n+1} = \rho Z_n$. Notice that $\rho$ is usually unknown. Two forecasting strategies can be followed here. The first one is to estimate $\rho$ and then apply it to the last observation $Z_n$ to obtain a prediction of $Z_{n+1}$ (see Bosq, Besse and Cardot [2], Pumo [3]). Alternatively, one may directly predict $Z_{n+1}$ by estimating the relevant elements of the range of $\rho^*$ (see Antoniadis and Sapatinas [1]). Adopting this last strategy and using a wavelet decomposition, the latter authors obtain considerably better prediction results. The choice of a wavelet basis is guided by its good approximation properties for representing quite irregular trajectories of $Z$. Kargin and Onatski [5] also use the second strategy but propose to use a data-dependent basis adapted to the prediction task.

While ARH processes are a natural generalization of the well-known autoregressive processes in Euclidean spaces, the infinite dimension of the space $H$ produces new challenges for their estimation and prediction (see Mas and Pumo [5] for a recent review on this topic). A second issue is the study of some of the extensions developed in the scalar case within the Hilbertian framework, such as the higher order ARH processes studied in Pumo [7].

We are interested in another extension that takes into account exogenous information, modeled by the influence of covariates in the model given by equation (2). We may cite Mas and Pumo, who use the derivative of $Z_k$ as a covariate, or Damon and Guillas [9], who introduce a function-valued covariate that also follows an ARH process. In both these works, the covariates are introduced as additive terms in equation (2).

Alternatively, one may introduce exogenous information through the linear operator $\rho$. As in the scalar case, one may consider a more general setting where the parameter $\rho$ depends on some covariate. In such cases, the exogenous information may be incorporated in a non-additive manner. Guillas [10] proposes to model $Z$ by a doubly stochastic Hilbert process defined by

\[ Z_k = \rho_{V_k}(Z_{k-1}) + \epsilon_k, \quad k \in \mathbb{Z}, \tag{3} \]

where $V = (V_k, k \in \mathbb{Z})$ is a sequence of independent identically distributed Bernoulli variables. The intuition behind the model is that there exist two regimes expressed through two different operators, $\rho_0$ and $\rho_1$. At each instant $k$, one of the regimes is randomly chosen according to the draw of the associated Bernoulli variable $V_k$. The model allows one of the regimes to be explosive, provided it is not visited too often. In such a case, equation (3) has a unique stationary solution.

In this paper, we introduce the Conditional Autoregressive Hilbertian process (CARH), constructed such that, conditionally on an exogenous covariate $V$, the process $Z$ follows an ARH process. While the CARH definition is similar to equation (3), it differs mainly in two ways. The first one is related to the nature of the process $V$, which we assume to be a multivariate random process with some continuous distribution. Second, we propose predictors that accomplish the prediction task using the exogenous information. Indeed, the exogenous information on the current regime is used to find similar local situations in the observed past.

The paper is structured as follows. In Section 2 we introduce the main definitions and we present the model. Linear operators on Hilbert spaces are used intensively throughout the article. In Appendix A we recall some important facts on this topic that we use in the article. We also propose estimators for the unknown parameters as well as two classes of predictors. The main results about the convergence of the estimators and predictors are stated in Section 3, postponing the proofs to Appendix B. Finally, Section 4 contains a real data application of CARH processes which illustrates empirically the performance of the predictors.
2. Conditional ARH process: CARH.

After introducing some notation, we define the CARH process and we propose estimators for the associated operator parameters. Then, we follow prediction strategies similar to those adopted in previous studies for ARH processes, to obtain classes of predictors for the CARH process.

2.1. Preliminaries 

All variables are defined on the same probability space $(\Omega, \mathcal{F}, P)$. We consider a sequence $Z = (Z_k, k \in \mathbb{Z})$ of Hilbert space valued random variables, i.e. each random variable $Z_k$ is a measurable map from the probability space into a real separable Hilbert space $H$ endowed with its Borel $\sigma$-field, $\mathcal{H}$. The space $H$ is equipped with the scalar product $\langle \cdot, \cdot \rangle_H$ and the induced norm $\|\cdot\|_H$. We also consider a sequence of $\mathbb{R}^d$-valued random variables $V = (V_k, k \in \mathbb{Z})$. Both sequences $Z$ and $V$ are assumed to be stationary. We will focus on the behaviour of $Z$ conditionally on $V$. We will further assume that $Z$ is strongly integrable.

The conditional expectation is characterized by the conditional distribution of $Z$ given $V$, i.e. by the conditional probability $P_{Z|V}$ on $\mathcal{H}$. In order to ensure that this conditional probability is properly defined as a measure (in the sense that it represents a regular version of the conditional probability), it is assumed that a transition probability exists that associates to each $v \in \mathbb{R}^d$ a probability measure $P^v$ on $(H, \mathcal{H})$ such that

\[ P^v_{Z|V}(A) = P^v(A), \quad \text{for every } A \in \mathcal{H},\ v \in \mathbb{R}^d. \]

We call $P^v$ the sampling measure and denote by $\mathbb{E}^v$ the induced expectation. We restrict our attention to functions defined over a real compact interval $T$ and we assume hereafter $T$ to be $[0, 1]$ without loss of generality. More precisely, we set $H$ to be the subspace of continuous functions in the space of classes of 4-th order $P$-integrable functions.

2.2. The model 

A sequence $(Z, V) = \{(Z_k, V_k), k \in \mathbb{Z}\}$ of $H \times \mathbb{R}^d$-valued random variables is a Conditional Autoregressive Hilbertian process (CARH) of order 1 if it is stationary and such that, for each $k$,

\[ Z_k = a + \rho_{V_k}(Z_{k-1} - a) + \epsilon_k, \tag{4} \]

where the conditional mean function $a_v = \mathbb{E}^v[Z_0 \mid V]$, $v \in \mathbb{R}^d$, is the conditional expectation (on $V$) of the process, $\epsilon = (\epsilon_k, k \in \mathbb{Z})$ is an $H$-valued white noise and $(\rho_{V_k}, k \in \mathbb{Z})$ is a sequence of random operators such that, conditionally on $V$, $\rho_V$ is a linear compact operator on $H$ (see Appendix A). Additionally, $V$ and $\epsilon$ are independent processes. Under the following assumptions we prove the existence and uniqueness of the CARH processes (see Appendix B for the proof).
Assumptions 2.1. Assume that:

1. There exists a map $v \mapsto P^v$ that assigns a probability measure on $(H, \mathcal{H})$ to each value $v$ in the support of $V$.

2. $\sup_v \|\rho_v\|_{\mathcal{L}} = M_\rho < 1$.

Theorem 2.2. Under Assumptions 2.1, equation (4) defines a CARH process with a unique stationary solution given by

\[ Z_k = a + \sum_{j=0}^{\infty} \Big( \prod_{p=0}^{j-1} \rho_{V_{k-p}} \Big)(\epsilon_{k-j}), \]

with the convention $\prod_{p=0}^{j-1} \rho_{V_{k-p}} = \mathrm{Id}$ (the identity operator) for $j = 0$.



The first condition in Assumptions 2.1 has already been discussed. The second one ensures the contraction of the conditional autoregressive operator through the supremum norm of linear operators (see Appendix A).
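The series in Theorem 2.2 can be checked numerically on a toy example. The sketch below is only an illustration under assumptions that are not in the paper: curves are replaced by vectors of length 5 so that operators become matrices, the mean is $a = 0$, the covariate is uniform on $[-1, 1]$, and the conditional operator is the hypothetical choice $\rho_v = M |v| B$ with $\|B\| = 1$, so that $\sup_v \|\rho_v\| \le M < 1$. All function names are made up for this example.

```python
import numpy as np

def rho(v, B, M):
    """Hypothetical conditional operator M * |v| * B (spectral norm <= M when |v| <= 1)."""
    return M * abs(v) * B

def simulate_carh(n, B, M, rng):
    """Run Z_k = rho_{V_k} Z_{k-1} + eps_k (with a = 0), started from Z_0 = 0."""
    dim = B.shape[0]
    V = rng.uniform(-1.0, 1.0, size=n)
    eps = rng.standard_normal((n, dim))
    Z = np.zeros((n, dim))
    for k in range(1, n):
        Z[k] = rho(V[k], B, M) @ Z[k - 1] + eps[k]
    return Z, V, eps

def series_solution(k, V, eps, B, M):
    """Sum over j = 0..k-1 of (prod_{p=0}^{j-1} rho_{V_{k-p}}) eps_{k-j}."""
    total = np.zeros(B.shape[0])
    prod = np.eye(B.shape[0])            # empty product is the identity (case j = 0)
    for j in range(k):
        total += prod @ eps[k - j]
        prod = prod @ rho(V[k - j], B, M)  # append the next factor on the right
    return total

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
B /= np.linalg.norm(B, 2)                # normalise to spectral norm 1
Z, V, eps = simulate_carh(200, B, M=0.8, rng=rng)
```

Because the recursion is started at $Z_0 = 0$, the series is truncated at the start time and must agree with the recursion exactly (up to rounding); the contraction bound is what makes the discarded remote terms negligible in the stationary case.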



2.3. Associated operators

Hereafter we make the additional assumption that $\mathbb{E}^v[\|Z\|_H^4 \mid V] < \infty$. Let us denote by $H^*$ the topological dual of $H$, i.e. the space of bounded linear functionals on $H$. We introduce two linear operators mapping from $H^*$ to $H$ associated with the CARH process. Thanks to the Riesz representation, $H^*$ can be identified with $H$, and the operators may be defined as follows:

\[ z \in H \mapsto \Gamma_v z = \mathbb{E}^v[((Z_0 - a) \otimes (Z_0 - a))(z) \mid V] \quad \text{and} \]
\[ z \in H \mapsto \Delta_v z = \mathbb{E}^v[((Z_0 - a) \otimes (Z_1 - a))(z) \mid V], \]

which we call the conditional (on $V$) covariance and cross-covariance operators respectively. We have used the tensor product notation $(u \otimes v)(z) = \langle u, z \rangle_H\, v$ for $u, v, z \in H$.

For each $v \in \mathbb{R}^d$, both $\Gamma_v$ and $\Delta_v$ are trace-class and hence Hilbert-Schmidt. In addition, $\Gamma_v$ is positive definite and self-adjoint. Then, we may write down the spectral decomposition of $\Gamma_v$ as

\[ \Gamma_v = \sum_{j \in \mathbb{N}} \lambda_{v,j}\, (e_{v,j} \otimes e_{v,j}), \]

where $(\lambda_{v,j}, e_{v,j})_{j \in \mathbb{N}}$ are the eigen-elements of $\Gamma_v$. The eigenvalues may be arranged to form a non-negative decreasing sequence of numbers tending towards zero.

As a direct consequence of the choice made for $H$, the operators have associated kernels $\gamma_v$ and $\delta_v$ defined over $L^2([0, 1]^2)$ such that

\[ \Gamma_v(z)(t) = \int_0^1 \gamma_v(s,t)\, z(s)\, ds, \]
\[ \Delta_v(z)(t) = \int_0^1 \delta_v(s,t)\, z(s)\, ds, \quad t \in [0, 1],\ v \in \mathbb{R}^d,\ z \in H, \]

with $\gamma_v(\cdot, \cdot)$ a continuous, symmetric and positive kernel and $\delta_v(\cdot, \cdot)$ a continuous kernel. The kernels turn out to be the conditional covariance function $\gamma_v(s,t) = \mathbb{E}^v[(Z_0(s) - a(s))(Z_0(t) - a(t)) \mid V]$ and the one-step-ahead conditional cross-covariance function $\delta_v(s,t) = \mathbb{E}^v[(Z_0(s) - a(s))(Z_1(t) - a(t)) \mid V]$, $(s,t) \in [0,1]^2$, $v \in \mathbb{R}^d$.

A Yule-Walker-like relation links the operators $\Delta_v$, $\Gamma_v$ and $\rho_v$. For each $v \in \mathbb{R}^d$ we have

\[ \Delta_v = \rho_v \Gamma_v. \tag{5} \]

Using the property of the adjoint and the symmetry of $\Gamma_v$, we obtain from (5) the following key relation for the estimation of $\rho_v$ (see Section 2.5),

\[ \Delta_v^* = \Gamma_v \rho_v^*. \tag{6} \]
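For the reader's convenience, here is a short sketch of why the Yule-Walker-like relation holds. It is not taken from the paper's proofs; it assumes that, conditionally on $V = v$, the noise $\epsilon_1$ is centred and uncorrelated with $Z_0$, and that the bounded operator $\rho_v$ can be exchanged with the conditional expectation. Starting from $Z_1 - a = \rho_v(Z_0 - a) + \epsilon_1$ and the tensor notation $(u \otimes w)(z) = \langle u, z \rangle_H\, w$:

```latex
\begin{aligned}
\Delta_v z
 &= \mathbb{E}^v\big[\langle Z_0 - a, z\rangle_H\,(Z_1 - a)\mid V\big] \\
 &= \mathbb{E}^v\big[\langle Z_0 - a, z\rangle_H\,\rho_v(Z_0 - a)\mid V\big]
  + \mathbb{E}^v\big[\langle Z_0 - a, z\rangle_H\,\epsilon_1\mid V\big] \\
 &= \rho_v\,\mathbb{E}^v\big[\langle Z_0 - a, z\rangle_H\,(Z_0 - a)\mid V\big] + 0
  \;=\; \rho_v\,\Gamma_v z ,
\qquad\text{hence}\qquad
\Delta_v^{*} = (\rho_v\Gamma_v)^{*} = \Gamma_v^{*}\rho_v^{*} = \Gamma_v\rho_v^{*}.
\end{aligned}
```

The last equality uses only the self-adjointness of $\Gamma_v$.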

2.4. Estimation of $a$, $\Gamma_v$, $\Delta_v$.

The parameters can be estimated from data. We call $\{(Z_1, V_1), \ldots, (Z_n, V_n)\}$ the observed data, supposed to come from a CARH process. We use nonparametric Nadaraya-Watson-like estimators to estimate the infinite-dimensional parameters $a_v$, $\Gamma_v$ and $\Delta_v$. This is a popular choice when the parameters are defined through conditional expectations.
2.4.1. Estimation of $a_v$.

We estimate the conditional mean function of the process, $a_v(t) = \mathbb{E}^v[Z_0(t) \mid V]$ for all $t \in [0, 1]$, using the observations $\{(Z_1, V_1), \ldots, (Z_n, V_n)\}$. In order to properly define the framework, let us introduce some quantities. For some fixed $t \in [0, 1]$, set $Y = Z_0(t)$ and $Y_i = Z_i(t)$, $i = 1, \ldots, n$. Let us assume that the distribution of $V$ admits a density $f$ with respect to the Lebesgue measure. We define for $v \in \mathbb{R}^d$

\[ g_v(t) = \mathbb{E}^v[Z_0(t) f(V) \mid V], \]

and provided that $f(v) > 0$ we rewrite the parameter as the regression of $Y$ against $V$,

\[ a_v(t) = \mathbb{E}^v[Y \mid V] = \frac{g_v(t)}{f(v)}. \]

When $f(v) = 0$ we set $a_v(t) = \mathbb{E}[Y]$.

The introduced quantities can be estimated by Nadaraya-Watson kernel-based estimators. In our case, we use the following estimators for $f$ and $g_v$ respectively,

\[ f_n(v) = \frac{1}{n h_a^d} \sum_{i=1}^{n} K\big(h_a^{-1}(V_i - v)\big) \quad \text{and} \tag{7} \]

\[ g_{v,n}(t) = \frac{1}{n h_a^d} \sum_{i=1}^{n} K\big(h_a^{-1}(V_i - v)\big)\, Y_i, \tag{8} \]

where $K : \mathbb{R}^d \to \mathbb{R}$ is a unitary square-integrable $d$-dimensional kernel and the bandwidth $h_a = (h_{a,n})_{n \in \mathbb{N}}$ is a decreasing sequence of positive numbers tending to 0. The estimator of $a_v(t)$ is then given by

\[ a_{v,n}(t) = \frac{g_{v,n}(t)}{f_n(v)}, \]

which can be written as $a_{v,n}(t) = \sum_{i=1}^{n} w_{n,i}(v, h_a)\, Y_i$, a weighted mean of the observed values with weights given by

\[ w_{n,i}(v, h) = \frac{K\big(h^{-1}(V_i - v)\big)}{\sum_{j=1}^{n} K\big(h^{-1}(V_j - v)\big)}. \tag{9} \]
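As an illustration, the weights of equation (9) and the resulting conditional mean estimator can be sketched as follows. This is a minimal sketch under assumptions that are not in the paper: curves are stored on a common grid as rows of an array `Z`, the kernel is a product Gaussian kernel, and uniform weights are used when all kernel values vanish. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel on R^d; u has shape (n, d)."""
    d = u.shape[1]
    return np.exp(-0.5 * np.sum(u**2, axis=1)) / (2.0 * np.pi) ** (d / 2.0)

def nw_weights(V, v, h):
    """Weights w_{n,i}(v, h) of equation (9); V has shape (n, d), v has shape (d,)."""
    k = gaussian_kernel((V - v) / h)
    total = k.sum()
    if total == 0.0:
        # Assumed fallback when the denominator vanishes: uniform weights 1/n.
        return np.full(len(V), 1.0 / len(V))
    return k / total

def conditional_mean(Z, V, v, h):
    """Estimate a_v on the grid as the weighted mean of the rows of Z (shape (n, T))."""
    return nw_weights(V, v, h) @ Z

V = np.array([[0.0], [1.0], [2.0]])
Z = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
a_hat = conditional_mean(Z, V, np.array([1.0]), h=0.5)
```

The normalising factor $1/(n h^d)$ of equations (7) and (8) cancels in the ratio, which is why it does not appear in the weights.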

2.4.2. Estimation of $\Gamma_v$.

For the estimation of $\Gamma_v$ we proceed in an analogous way. Without loss of generality, we assume that $Z$ is centered. First, for $(s, t) \in [0, 1]^2$ fixed, consider the real-valued variable $Y = Z_0(s) Z_0(t)$ and the observations $Y_i = Z_i(s) Z_i(t)$ with $i = 1, \ldots, n$. Now redefine the auxiliary quantity $g_v$ using the new definition of $Y$ and $Y_i$, $i = 1, \ldots, n$. Set $g_v(t) = \mathbb{E}^v[Z_0(s) Z_0(t) \mid V]$ and write the parameter again as the regression of $Y$ against $V$. Then, with a similar reasoning, it follows that the estimator of the kernel $\gamma_v$ at $(s, t)$ is

\[ \gamma_{v,n}(s,t) = \sum_{i=1}^{n} w_{n,i}(v, h_\gamma)\, Z_i(s) Z_i(t), \]

with weights given by (9). Moreover, in the general case of a not necessarily centered process, the estimator of $\Gamma_v$ can be written as

\[ \Gamma_{v,n} = \sum_{i=1}^{n} w_{n,i}(v, h_\gamma)\, (Z_i - a_{v,n}) \otimes (Z_i - a_{v,n}). \tag{10} \]
2.4.3. Estimation of the conditional cross-covariance operator $\Delta_v$.

Again, the estimation of the operator is done through the estimation of its kernel, which is in this case the conditional cross-covariance function $\delta_v$. We work first with the centered process. Fix $(s,t) \in [0, 1]^2$ and again redefine $Y = Z_0(s) Z_1(t)$ and the observations $Y_i = Z_{i-1}(s) Z_i(t)$ for $i = 2, \ldots, n$. Define $f$ and $g_v$ and their estimators of the same form as (7) and (8) respectively, using the bandwidth $h_\delta$ and the new variables $Y$ and $Y_i$. The resulting estimator of $\delta_v(s,t)$ is

\[ \delta_{v,n}(s,t) = \sum_{i=2}^{n} w_{n,i}(v, h_\delta)\, Z_{i-1}(s) Z_i(t). \]

We can now plug the estimated kernel into the operator, which yields the estimator of $\Delta_v$. We write it for the general case of a non-centered process as

\[ \Delta_{v,n} = \sum_{i=2}^{n} w_{n,i}(v, h_\delta)\, (Z_{i-1} - a_{v,n}) \otimes (Z_i - a_{v,n}), \]

where the weights are given by equation (9).
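On a discretisation grid, the two kernel estimators reduce to weighted sums of outer products. The sketch below assumes (these are illustrative choices, not the paper's implementation) that curves are rows of an array `Z`, that a weight vector `w` and a mean curve `a` have already been computed, and that quadrature weights of the grid are ignored. The returned matrices are indexed as `[s, t]`, matching the kernels $\gamma_{v,n}(s,t)$ and $\delta_{v,n}(s,t)$.

```python
import numpy as np

def conditional_cov(Z, w, a):
    """gamma_{v,n}[s, t] = sum_i w_i (Z_i(s) - a(s)) (Z_i(t) - a(t))."""
    C = Z - a
    return (w[:, None] * C).T @ C

def conditional_cross_cov(Z, w, a):
    """delta_{v,n}[s, t] = sum_{i>=2} w_i (Z_{i-1}(s) - a(s)) (Z_i(t) - a(t))."""
    C = Z - a
    # Pair each curve with its successor; the weight is the one of the later curve.
    return (w[1:, None] * C[:-1]).T @ C[1:]

Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.full(3, 1.0 / 3.0)
a = np.zeros(2)
gamma_hat = conditional_cov(Z, w, a)
delta_hat = conditional_cross_cov(Z, w, a)
```

Note that with this indexing the matrix acting like the operator $\Delta_{v,n}$ on a vector is the transpose of `delta_hat`, since $\Delta_v(z)(t)$ integrates the kernel over its first argument.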
Remark

If the denominator in equation (9) defining the weights is equal to zero, i.e. $f_n(v) = 0$, then one usually sets the weights to $w_{n,i}(v, h_a) = n^{-1}$ or $w_{n,i}(v) = 0$ for all $i = 1, \ldots, n$ in order to define the estimator for all $v \in \mathbb{R}^d$. The weights are larger for those segments $Z_i$ whose value of $V_i$ is closer to the target $v$. The bandwidth plays a key role: through the scaling of the kernel function, it tunes how close the points $V_i$ of $\mathbb{R}^d$ must be to $v$ to matter. Large values of $h$ lead to weights $w_{n,i}$ that are not negligible for a large number of observations. Conversely, small values result in only a few observations having a significant impact on the estimator. This produces the usual trade-off between bias and variance of kernel regression estimators.

2.5. Estimation of $\rho_v$.

The intrinsic infinite dimension of the space makes the estimation of the operator $\rho_v$ difficult. If $H$ is finite-dimensional, equation (5) provides a natural way of estimating $\rho_v$: one may plug in the empirical counterparts of the covariance operators and solve the equation in $\rho_v$. However, when $H$ has infinite dimension, $\Gamma_v$ is no longer invertible. To identify $\rho_v$ from (5), the eigenvalues of $\Gamma_v$ need to be strictly positive. An equivalent requirement is that the kernel (null space) of $\Gamma_v$ be trivial (see Mas and Pumo [8]). In this case, a linear measurable mapping $\Gamma_v^{-1}$ can be defined as $\Gamma_v^{-1} = \sum_{j \in \mathbb{N}} \lambda_{v,j}^{-1} (e_{v,j} \otimes e_{v,j})$ with domain

\[ \mathcal{D}_{\Gamma_v^{-1}} = \Big\{ z \in H : \sum_{j \in \mathbb{N}} \lambda_{v,j}^{-2} \langle z, e_{v,j} \rangle_H^2 < \infty \Big\}, \]

that is a dense subset of $H$. It turns out to be an unbounded operator and, in consequence, continuous nowhere. Hence, there is no hope of obtaining any theoretical asymptotic result. However, from (5) we obtain that

\[ \rho_v\big|_{\mathcal{D}_{\Gamma_v^{-1}}} = \Delta_v \Gamma_v^{-1}, \]

where the left-hand side is the conditional autoregression operator $\rho_v$ restricted to $\mathcal{D}_{\Gamma_v^{-1}}$, as a consequence of $\Gamma_v \Gamma_v^{-1} = \mathrm{Id}_{\mathcal{D}_{\Gamma_v^{-1}}}$. On the other hand, since the adjoint of a linear operator in $H$ with a dense domain is closed (closed graph theorem, see for example Kato [11, Theorem 5.20]) and since the range of the adjoint of the cross-covariance operator, $\Delta_v^*$, is included in $\mathcal{D}_{\Gamma_v^{-1}}$, we can deduce from (6) that over $\mathcal{D}_{\Gamma_v^{-1}}$,

\[ \rho_v^* = \Gamma_v^{-1} \Delta_v^*. \]

As pointed out by Mas [12], one can use classical results on linear operators to extend by continuity the definition of $\rho_v^*$ to $H$, in order to obtain

\[ \rho_v^* = \mathrm{Ext}(\Gamma_v^{-1} \Delta_v^*) = (\Gamma_v^{-1} \Delta_v^*)^{**} = (\Delta_v \Gamma_v^{-1})^*. \]

Therefore one may focus on the estimation of $\rho_v^*$, because the theoretical properties carry over to $\rho_v$ by taking the adjoint.

We can now propose two classes of estimators for $\rho_v^*$ (see Mas [12] for the analogy with the estimators in the ARH setting). The first one, the class of projection estimators, projects the function-valued observations on an appropriate subspace $H_v^{k_n}$ of finite dimension $k_n = k_{v,n}$. Let $\Pi_v^{k_n}$ be the projection operator onto $H_v^{k_n}$. Then one inverts the linear operator defined by the random matrix $\Pi_v^{k_n} \Gamma_{v,n} \Pi_v^{k_n}$ and completes with the null operator on the orthogonal subspace. For example, the space $H_v^{k_n}$ may be set equal to the one generated by the first $k_n$ eigenfunctions of $\Gamma_v$. Then, the subspace $H_v^{k_n}$ is estimated by $\hat{H}_v^{k_n}$, the linear span of the first $k_n$ empirical eigenfunctions. In this way, if $P_{v,k_n}$ is the projection operator onto $\hat{H}_v^{k_n}$, the estimator of $\rho_v^*$ can be written as

\[ \rho^*_{v,n,k_n} = (P_{v,k_n} \Gamma_{v,n} P_{v,k_n})^{-1} \Delta^*_{v,n} P_{v,k_n}. \tag{11} \]

Estimation by projection onto a finite-dimensional space is equivalent to approximating $\Gamma_v^{-1}$ by a linear operator with additional regularity, $\Gamma_v^\dagger$, defined as

\[ \Gamma_v^\dagger = \sum_{j=1}^{k_n} b(\lambda_{v,j})\, (e_{v,j} \otimes e_{v,j}), \]

where $(k_n)_n$ is an increasing sequence of integers tending to infinity and $b$ is some smooth function converging pointwise to $x \mapsto 1/x$. Indeed, $\Gamma_v^\dagger \to \Gamma_v^{-1}$ when $k_n \to \infty$. The choice $b(x) = 1/x$ amounts, for a finite $k_n$, to setting $\Gamma_v^\dagger$ equal to a spectral cut of $\Gamma_v^{-1}$. However, this choice is not unique. Mas [12] considers a family of functions $b_{p,\alpha} : \mathbb{R}^+ \to \mathbb{R}^+$ with $p \in \mathbb{N}$ such that

\[ b_{p,\alpha}(x) = \frac{x^p}{(x + \alpha_n)^{p+1}}, \]

with $\alpha_n$ a strictly positive sequence that tends to 0 as $n \to +\infty$. With this, the second class of estimators for $\rho_v^*$, the resolvent class, is defined as

\[ \rho^*_{v,n,p,\alpha} = b_{p,\alpha}(\Gamma_{v,n})\, \Delta^*_{v,n}, \tag{12} \]

where we write $b_{p,\alpha}(\Gamma_{v,n}) = (\Gamma_{v,n} + \alpha_n I)^{-(p+1)} \Gamma_{v,n}^{p}$ with $p > 0$, $\alpha_n > 0$, $n > 0$. Then, the operator $b_{p,\alpha}(\Gamma_{v,n})$ can be regarded as a regularized approximation of $\Gamma_v^{-1}$ (see Antoniadis and Sapatinas [1] for a discussion of this topic applied to ARH estimation).

Finally, both classes of estimators allow one to predict the future value $Z_{n+1}$ from the observations by first estimating the autocorrelation operator $\rho_v^*$ and then applying it to the last available observation $Z_n$.
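In finite dimension both classes of estimators can be written in a few lines, which may help to see what the regularisation does. The sketch below is illustrative only. It assumes that `Gamma` and `Delta` are the matrices of the (estimated) covariance and cross-covariance operators, so that `Delta = R @ Gamma` when the autoregression operator has matrix `R`, and it takes the resolvent family in the form $b_{p,\alpha}(x) = x^p/(x+\alpha)^{p+1}$. Function names are hypothetical.

```python
import numpy as np

def projection_estimator(Gamma, Delta, k):
    """(P Gamma P)^{-1} Delta^* P, with P the projector on the top-k eigenvectors of Gamma."""
    lam, E = np.linalg.eigh(Gamma)             # eigh returns ascending eigenvalues
    idx = np.argsort(lam)[::-1][:k]            # indices of the k largest eigenvalues
    lam_k, E_k = lam[idx], E[:, idx]
    P = E_k @ E_k.T
    Gamma_inv_k = (E_k / lam_k) @ E_k.T        # inverse on the subspace, zero elsewhere
    return Gamma_inv_k @ Delta.T @ P

def resolvent_estimator(Gamma, Delta, alpha, p):
    """b_{p,alpha}(Gamma) Delta^*, with b(x) = x^p / (x + alpha)^(p + 1)."""
    lam, E = np.linalg.eigh(Gamma)
    b = lam**p / (lam + alpha) ** (p + 1)
    return (E * b) @ E.T @ Delta.T             # E diag(b) E^T Delta^T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Gamma = A @ A.T + np.eye(4)                    # symmetric positive definite
R = 0.5 * rng.standard_normal((4, 4))          # toy autoregression matrix
Delta = R @ Gamma                              # Yule-Walker relation (5)
rho_star_proj = projection_estimator(Gamma, Delta, k=4)
rho_star_res = resolvent_estimator(Gamma, Delta, alpha=1e-8, p=1)
```

With exact operators, full dimension and a vanishing `alpha`, both functions recover the adjoint `R.T`; with `k` smaller than the dimension the projection estimator has rank at most `k`, which is the price paid for a stable inverse.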
3. Main results

In this section we state the main theoretical results that justify the choices made for the estimators presented in the previous section.

Neither $(V_k, k \in \mathbb{Z})$ nor $(Z_k, k \in \mathbb{Z})$ is assumed to have independent components. We deal with their dependence through a strong mixing hypothesis, that is, we assume each sequence to be asymptotically independent by controlling the decay of the dependence. Many notions of mixing exist in the literature. In general one relies upon a measure of the decay of the dependence between two observations as a function of their time gap. We use the 2-$\alpha$-mixing setting, a slightly weaker setting than the $\alpha$-mixing one (see Bosq and Blanke [13]). Let $X = (X_k, k \in \mathbb{Z})$ be a stationary random process and consider the $\sigma$-algebras $\sigma(X_0)$ and $\sigma(X_k)$. The 2-$\alpha$-mixing coefficients are defined as

\[ \alpha_X^{(2)}(k) = \sup_{B \in \sigma(X_0),\, C \in \sigma(X_k)} |P(B \cap C) - P(B)P(C)|. \]

When $\lim_{k \to \infty} \alpha_X^{(2)}(k) = 0$ we say that $X$ is 2-$\alpha$-mixing. If the mixing coefficients decay geometrically, then the corresponding process is called geometrically strongly mixing (GSM).

3.1. Convergence of the mean function estimator $a_v$.

We first prove the pointwise convergence, i.e. for a fixed $t \in [0, 1]$, using the additional Assumptions 3.1. A uniform convergence is obtained by assuming the last two conditions of Assumptions 3.1 to hold uniformly on $[0, 1]$. See Appendix B for proofs together with explicit constants (depending on $v$) for the convergence rate.

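Assumptions 3.1. Assume that:

i. $V$ admits a probability density function $f$ and, for each $s \neq t$, $(V_s, V_t)$ has a density $f_{V_s,V_t}$ such that $\sup_{|s-t| > 1} \|G_{s,t}\|_\infty < \infty$, where $G_{s,t} = f_{V_s,V_t} - f \otimes f$.

ii. Both $(Z_k, k \in \mathbb{Z})$ and $(V_k, k \in \mathbb{Z})$ are strong mixing processes with geometrically decaying coefficients $\alpha^{(2)}(k) = \beta_0 e^{-\beta_1 k}$ for some $\beta_0, \beta_1 > 0$ and $k \geq 1$.

iii. $\|Z_k\|_H = M_Z < \infty$, $\forall k$.

iv. The kernel $K$ is a bounded symmetric density satisfying
   1. $\lim_{\|v\| \to \infty} \|v\|_{\mathbb{R}^d}^{d} K(v) = 0$,
   2. $\int_{\mathbb{R}^d} \|v\|_{\mathbb{R}^d}^{2} K(v)\, dv < \infty$,
   3. $\int_{\mathbb{R}^d} |v_i| |v_j| K(v)\, dv < \infty$ for $i, j = 1, \ldots, d$.

v. The maps $v \mapsto f(v)$ and $v \mapsto g_v(t)$, $t \in [0, 1]$, belong to $C_d^2(b)$, the space of twice continuously differentiable functions $z$ defined on $\mathbb{R}^d$ and such that $\left\| \frac{\partial^2 z}{\partial v_i \partial v_j} \right\|_\infty < b$.

vi. $\mathbb{E}^v[Z_0^2(t) \mid V] f(v)$, $t \in [0, 1]$, is strictly positive, continuous and bounded at $v \in \mathbb{R}^d$.

Proposition 3.2. Under Assumptions 2.1 and 3.1, for a bandwidth verifying $h_{a,n} = c_n \left(\frac{\ln n}{n}\right)^{1/(d+4)}$, $c_n \to c > 0$, when $n \to \infty$, we have

1. \[ f_n(v) - f(v) = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s., \tag{13} \]

2. \[ a_{v,n}(t) - a_v(t) = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]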

Let us comment on the assumptions for this result. The density condition 3.1(i) may be dropped if one uses a more general framework like that of Dabo-Niang and Rhomari [14], where no density assumption is made and the observations are independent. However, similar results for dependent data are not available yet. The hypothesis concerning the decay of the mixing coefficients allows us to control the variance of the estimators. We impose some weak conditions on the kernel $K$ that are usual in nonparametric estimation. All symmetric kernels defined over a compact support verify the hypothesis, as do more general ones like the Gaussian kernel. Conditions v and vi are used to control the bias terms of the estimators, which is a purely analytical matter.

The convergence rates obtained in Proposition 3.2 are the usual ones. They rapidly degrade as the dimension of $\mathbb{R}^d$, the space where $V$ lives, increases, as a consequence of the curse of dimensionality. On the one hand, the first result is well known in the estimation of multidimensional density functions, even for dependent data. We include it for the sake of completeness. Note that only the observations $V_1, \ldots, V_n$ are used to estimate $f(v)$, the density of $V$ at $v \in \mathbb{R}^d$. This result is true for each $t \in [0, 1]$. On the other hand, the consistency of $a_{v,n}(t)$ is only valid for some fixed value $t \in [0, 1]$. However, we can obtain a version of this result that holds true uniformly on $[0, 1]$ (conditionally on $V$).
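The effect of the dimension $d$ on the bandwidth $h_n = c\,(\ln n / n)^{1/(d+4)}$ and on the rate $(\ln n / n)^{2/(4+d)}$ is easy to tabulate. The following small sketch only evaluates these two formulas; the constant `c = 1` and the sample size are arbitrary choices made for illustration.

```python
import math

def bandwidth(n, d, c=1.0):
    """Bandwidth of order (ln n / n)^(1 / (d + 4)); c is an arbitrary constant."""
    return c * (math.log(n) / n) ** (1.0 / (d + 4))

def rate(n, d):
    """Almost sure convergence rate (ln n / n)^(2 / (d + 4))."""
    return (math.log(n) / n) ** (2.0 / (d + 4))

for d in (1, 2, 5, 10):
    print(d, round(bandwidth(10_000, d), 3), round(rate(10_000, d), 4))
```

For a fixed sample size the rate gets closer to 1 (that is, slower) as $d$ grows, which is the curse of dimensionality mentioned above; with `c = 1` the rate is simply the square of the bandwidth.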



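Proposition 3.3. Under Assumptions 2.1, 3.1(i-iv), and if 3.1(v-vi) hold true for all $t \in [0,1]$, a bandwidth verifying $h_{a,n} = c_n \left(\frac{\ln n}{n}\right)^{1/(d+4)}$, $c_n \to c > 0$, when $n \to \infty$, yields

\[ \|a_n(v, \cdot) - a(v, \cdot)\|_H = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]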

3.2. Convergence of $\Gamma_{v,n}$ and $\Delta_{v,n}$.

Similarly to the convergence of the conditional mean function, we first prove the pointwise convergence of $\gamma_{v,n}(s,t)$ and $\delta_{v,n}(s,t)$, and then extend the result to the uniform convergence of these kernels over $[0, 1]^2$. Then, the consistency of the operators follows. In addition, we obtain the consistency of the estimators of the spectral elements of $\Gamma_v$.



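Proposition 3.4. Under Assumptions 2.1 and 3.1, and if $\mathbb{E}^v[\|Z\|_H^4 \mid V] < \infty$, then for a bandwidth verifying $h_{\gamma,n} = c_n \left(\frac{\ln n}{n}\right)^{1/(d+4)}$, $c_n \to c > 0$, when $n \to \infty$, we have

\[ \gamma_{v,n}(s,t) - \gamma_v(s,t) = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]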



Again, the result is valid uniformly for $(t, s) \in [0, 1]^2$. Through the equivalence between the Hilbert-Schmidt norm and the integral operator norm (on $L^2([0, 1]^2)$) one has

\[ \|\Gamma_{v,n} - \Gamma_v\|_{\mathcal{K}_2}^2 = \|\gamma_{v,n}(\cdot, \cdot) - \gamma_v(\cdot, \cdot)\|_{L^2([0,1]^2)}^2 = \int_0^1 \int_0^1 \big(\gamma_{v,n}(s,t) - \gamma_v(s,t)\big)^2\, ds\, dt, \]

and thus the strong consistency of $\Gamma_{v,n}$ follows.



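Proposition 3.5. Under Assumptions 2.1, 3.1(i-iv), and if 3.1(v-vi) hold true for all $t \in [0,1]$ and $\mathbb{E}^v[\|Z\|_H^4 \mid V] < \infty$, then a bandwidth verifying $h_{\gamma,n} = c_n \left(\frac{\ln n}{n}\right)^{1/(d+4)}$, with $c_n \to c > 0$, when $n \to \infty$, yields

\[ \|\Gamma_{v,n} - \Gamma_v\|_{\mathcal{K}_2} = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]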




Now, one may use the consistency properties of the empirical eigenvalues $\lambda_{v,j,n}$ as estimators of the true ones $\lambda_{v,j}$, $j \geq 1$, obtained by Bosq in the dependent case. A result concerning the convergence of the empirical conditional eigenfunctions $e_{v,j,n}$ is also provided. See Mas and Menneteau [16] for a general approach to transferring limit theorems and modes of convergence from the estimator of a covariance operator to the estimators of its eigenvalues.



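Corollary 3.6. Under the conditions of Proposition 3.5 we have

1. \[ \sup_{j \geq 1} |\lambda_{v,j,n} - \lambda_{v,j}| = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]

2. \[ \|e_{v,j,n} - e'_{v,j}\|_H = \xi_{v,j}\, O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s., \]

where $e'_{v,j} = \operatorname{sgn}(\langle e_{v,j,n}, e_{v,j} \rangle_H)\, e_{v,j}$, $\xi_{v,1} = 2\sqrt{2}/(\lambda_{v,1} - \lambda_{v,2})$ and $\xi_{v,j} = 2\sqrt{2}/\min(\lambda_{v,j-1} - \lambda_{v,j},\ \lambda_{v,j} - \lambda_{v,j+1})$ for $j \geq 2$.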

Note that the conditional eigenfunctions are estimated up to their sign. This causes problems 
both in practice and in theory. The estimated object is the eigen-space generated by the associated 
eigenfunction and not its direction. 
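In practice the sign indeterminacy is usually handled by aligning each estimated eigenvector with a reference before comparing or averaging. The sketch below shows one common convention (flip the estimate when its inner product with the reference is negative); it is an illustrative assumption, not a procedure described in the paper, and the function name is hypothetical.

```python
import numpy as np

def align_sign(e_hat, e_ref):
    """Return e_hat or -e_hat, whichever has a non-negative inner product with e_ref."""
    return -e_hat if np.dot(e_hat, e_ref) < 0 else e_hat

e_ref = np.array([0.6, 0.8])
e_hat = -np.array([0.6, 0.8])          # same eigen-space, opposite direction
aligned = align_sign(e_hat, e_ref)
```

After alignment the distance between `aligned` and `e_ref` reflects the estimation error of the eigen-space rather than an arbitrary choice of direction.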

Finally, using similar arguments we obtain the convergence of the conditional cross-covariance 
operator. 



Proposition 3.7. Under Assumptions 2.1, 3.1(i-iv), and if 3.1(v-vi) hold true for all $t \in [0,1]$ and $\mathbb{E}^v[\|Z\|_H^4 \mid V] < \infty$, then a bandwidth verifying $h_{\delta,n} = c_n \left(\frac{\ln n}{n}\right)^{1/(d+4)}$, with $c_n \to c > 0$, when $n \to \infty$, yields

\[ \|\Delta_{v,n} - \Delta_v\|_{\mathcal{K}_2} = O\left( \left( \frac{\ln n}{n} \right)^{\frac{2}{4+d}} \right) \quad a.s. \]



n 

3.3. Convergence of the predictors

The two proposed classes of estimators for $\rho_v^*$ can be used to predict $Z_{n+1}$ by applying them to the last observed function $Z_n$. However, since $Z_n$ was used in the construction of the estimator and the process has a memory length of 1, a better approach is to study the prediction error on the next element of the sequence. We introduce a final set of assumptions needed to show the convergence in probability, which we denote by $\xrightarrow{P}$.

Assumptions 3.8.

1. $\mathbb{E}^v[\|Z\|_H^4 \mid V] < \infty$.

2. $\Gamma_v$ is one-to-one.

3. $P(\liminf E_n) = 1$, where $E_n = \{\omega \in \Omega : \dim(\mathrm{Rg}(P_{v,k_n} \Gamma_{v,n} P_{v,k_n})) = k_n\}$, with $\mathrm{Rg}(A)$ denoting the range of the operator $A$.

4. $n \lambda_{k_n}^4(v) \to \infty$ and $\frac{1}{n} \sum_{j=1}^{k_n} \xi_j(v)/\lambda_j^2(v) \to 0$, as $n \to \infty$.

A strong finite fourth conditional moment of $Z$ was used for the definition of $\Gamma_v$ and $\Delta_v$. The second condition in 3.8 is necessary to uniquely define the conditional autoregression operator $\rho_v$. The third one is necessary to guarantee that the random operator $P_{v,k_n} \Gamma_{v,n} P_{v,k_n}$ is almost surely invertible. Controlling the decay of the eigenvalues of the conditional covariance operator is used for the consistency of the projection class operator (see Corollary 3.6 for the definition of $\xi$). Alternatively, one may set $\lambda_v(k) = \lambda_k(v)$ where $\lambda_v : \mathbb{R} \to \mathbb{R}$ is a convex function (see Mas).
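Theorem 3.9. If Assumptions 2.1 and 3.1 hold true $\forall t \in [0, 1]$ together with Assumptions 3.8, if $\lambda_k(v) = c\, c_1^k$, $c > 0$, $c_1 \in (0, 1)$, and if $k_n = o(\ln n)$ as $n \to \infty$, then

\[ \|\rho^*_{v,n,k_n}(Z_{n+1}) - \rho_v^*(Z_{n+1})\|_H \xrightarrow{P} 0. \]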

Theorem 3.10. If Assumptions 2.1, 3.1 hold truest G [0, 1] and S.S(i-ii), and ifb n -> 0, V^^n -> 



oo for some p > as n — > oo, then 



\[ \|\rho^*_{v,n,p,\alpha}(Z_{n+1}) - \rho_v^*(Z_{n+1})\|_H \xrightarrow{P} 0. \]



4. Empirical study

We apply the CARH process model to predict the daily electricity load curve for the French producer EDF (Électricité de France). Our aim is to introduce the temperature information as an exogenous covariate in a functional prediction model using CARH processes. The electricity demand is highly sensitive to meteorological conditions. In particular, changes in temperature during winter have a high impact on the French national demand. This relationship is not linear and depends on the hour of the day, the day of the week and the month of the cold season. Moreover, it is unknown how the temperature should be coded in order to extract the relevant information for a prediction model. More details on this dataset are given in Antoniadis et al. [18].

We compare, in terms of prediction error, the AutoRegressive Hilbertian model (ARH) and the Conditional AutoRegressive Hilbertian model (CARH). The data we use are the electricity load for the first three months of 2009 (when the load is very sensitive to temperature changes) recorded at a 30-minute resolution, and an estimate of the national temperature computed by EDF and recorded each hour. The function-valued process $Z$ is the sequence of daily loads of the national grid. As the calendar has a very important effect on the electricity demand, we work only with one day-type, namely the weekdays from Monday to Friday excluding holidays. The covariate $V$ is constructed as a univariate summary of the daily temperature profile. Concretely, we compute the coefficient of variation of the temperature records for each day. The total number of observations is 41, of which we use the first 33 (approximately 80%) for calibration of the model and the last 8 to measure the prediction quality of the calibrated model.
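The construction of the covariate and the error measure can be sketched as follows. This is only an illustration with synthetic numbers: the array layout (one row per day), the use of the sample standard deviation (`ddof=1`) in the coefficient of variation, and the pooling of all grid points in the RMSE are assumptions, since the paper does not spell out these details. Function names are hypothetical.

```python
import numpy as np

def coefficient_of_variation(temps):
    """Daily coefficient of variation (std / mean); temps has shape (n_days, n_records)."""
    return temps.std(axis=1, ddof=1) / temps.mean(axis=1)

def rmse(pred, obs):
    """Root mean square error pooled over all days and grid points."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

def calibration_test_split(n_obs, n_cal):
    """Chronological split: first n_cal indices for calibration, the rest for testing."""
    idx = np.arange(n_obs)
    return idx[:n_cal], idx[n_cal:]

rng = np.random.default_rng(2)
temps = 5.0 + rng.standard_normal((41, 24))    # synthetic hourly temperatures
V = coefficient_of_variation(temps)            # one covariate value per day
cal_idx, test_idx = calibration_test_split(41, 33)
```

A coefficient of variation is unstable when the daily mean is close to zero, so on real temperature data the measurement scale matters; this caveat concerns the sketch and is not discussed in the text.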



For both models we use projection-type estimators (see Equation (11)). Using the calibration dataset we estimate the parameter $k_n$, that is, the dimension of the projection space, for both models. In addition, we estimate the bandwidth parameters for the CARH model. The results of the parameter estimation are summarised in Table 1. We also compute the in-sample estimation error as the prediction error obtained using the set of parameters that minimise the root mean square error (RMSE) on the training dataset. The CARH model seems to obtain a better fit on the calibration set since it presents a smaller estimation error.

                      ARH    CARH
Dimension (k_n)         2       5
Estimation error     1616     929
Prediction error     1522    1265

Table 1: Values of the estimated parameters for both ARH and CARH prediction models. The estimated set of bandwidths is $h_a = 1.21 \times 10^{-1}$, $h_\gamma = \ldots \times 10^{-4}$, $h_\delta = 3.95 \times 10^{-1}$.




Figure 1: Three days of the electricty demand (solid gray) with the one-day ahead predictions 
using ARH (dashed) and CARH (solid black) models. 

To estimate the prediction error we use the test dataset, computing the error as 
the RMSE. Again the CARH model presents a smaller error than the ARH model. In Figure 1 
we present three days of the electricity demand as well as their predictions using the ARH and 
CARH models. The effect of the covariate seems to be expressed locally in some parts of the day. 
In fact, it corresponds to the daytime demand, which seems reasonable because the effect 
of the temperature on the electricity demand is higher during daytime hours than at night. 
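For concreteness, the error measure can be sketched as follows. This is a minimal version that pools the squared errors over all sampling points of all predicted days; whether the errors are pooled this way or averaged day by day is an assumption of the sketch, and the toy curves are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error over all sampled points of all curves."""
    squared = [(o - p) ** 2
               for obs_curve, pred_curve in zip(observed, predicted)
               for o, p in zip(obs_curve, pred_curve)]
    return math.sqrt(sum(squared) / len(squared))

# Two hypothetical daily curves sampled at two points each.
observed = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 2.0], [3.0, 6.0]]
print(rmse(observed, predicted))  # 1.0
```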

Appendix A. Linear operators in Hilbert spaces 

We recall here some relevant facts about linear operators on Hilbert spaces (see Kato [11, Chap. 
5] for details). 

We denote by $H^*$ the topological dual of $H$, i.e. the space of bounded linear functionals on $H$. 
Thanks to the Riesz representation theorem, $H^*$ can be identified with $H$. We denote by 
$\mathcal{L}$ the space of bounded linear operators from $H$ to $H$, equipped with the uniform norm 

$$\|\rho\|_{\mathcal{L}} = \sup_{\|z\|_H \le 1} \|\rho(z)\|_H, \qquad \rho \in \mathcal{L},\ z \in H.$$

This space is too large for our purposes, so one usually considers the subspace of compact operators 
$\mathcal{K}$, which is easier to deal with (see Mas [H]). For instance, if the operator $\rho$ is compact then it 
admits a unique spectral decomposition, i.e. for two bases $(\phi_j)_{j\in\mathbb{N}}$ and $(\psi_j)_{j\in\mathbb{N}}$ and a sequence of 
numbers $(\lambda_j)_{j\in\mathbb{N}}$ that we can choose to be non-negative (choosing the sign of $\psi_j$) we have 

$$\rho = \sum_{j\in\mathbb{N}} \lambda_j\, \phi_j \otimes \psi_j,$$

where we use the tensor product notation $(u \otimes v)(z) = \langle u, z\rangle_H\, v$ for any elements $z, u, v \in H$. We 
say that an operator $\rho$ is self-adjoint if $\langle \rho u, v\rangle_H = \langle u, \rho v\rangle_H$ for all $u, v \in H$. If $\rho$ is symmetric 
the decomposition becomes $\rho = \sum_{j\in\mathbb{N}} \lambda_j\, \phi_j \otimes \phi_j$ with eigen-elements $(\lambda_j, \phi_j)_{j\in\mathbb{N}}$. If $\rho$ is not 
self-adjoint, we denote by $\rho^*$ its adjoint. Finally, we say that $\rho$ is positive-definite if it satisfies 
$\langle \rho z, z\rangle_H \ge 0$ for all $z \in H$. Two subspaces of $\mathcal{K}$ will be of interest to us: the space of 
Hilbert-Schmidt operators $\mathcal{K}_2$ and the space of trace-class (or nuclear) operators $\mathcal{K}_1$, defined respectively as 



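$$\mathcal{K}_2 = \Big\{A \in \mathcal{K} : \sum_{j\in\mathbb{N}} \lambda_j^2 < \infty\Big\},$$

$$\mathcal{K}_1 = \Big\{A \in \mathcal{K} : \sum_{j\in\mathbb{N}} |\lambda_j| < \infty\Big\}.$$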



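The Hilbert-Schmidt operators form a separable Hilbert space with inner product 
$\langle \rho, \tau\rangle_{\mathcal{K}_2} = \sum_{j\in\mathbb{N}} \langle \rho(\psi_j), \tau(\psi_j)\rangle_H$, with $(\psi_j)_j$ an orthonormal basis and $\rho, \tau \in \mathcal{K}_2$ (the product does not depend 
on the choice of the basis, see Kato [11, p. 262]). The associated norm is given by 
$\|\rho\|^2_{\mathcal{K}_2} = \sum_{j\in\mathbb{N}} \|\rho(\psi_j)\|^2_H = \sum_{j\in\mathbb{N}} \lambda_j^2$. On the other hand, the space of trace-class operators endowed with the 
norm $\|\cdot\|_{\mathcal{K}_1}$, defined as $\|\rho\|_{\mathcal{K}_1} = \sum_j |\lambda_j|$, is a separable Banach space. Finally, from the continuity 
of the inclusions $\mathcal{K}_1 \subset \mathcal{K}_2 \subset \mathcal{K} \subset \mathcal{L}$ we have that 

$$\|\cdot\|_{\mathcal{K}_1} \ge \|\cdot\|_{\mathcal{K}_2} \ge \|\cdot\|_{\mathcal{L}}.$$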
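The ordering of the three norms can be checked numerically in finite dimension, where every operator is compact and the numbers $\lambda_j$ are the singular values of its matrix. The sketch below is only an illustration under that finite-dimensional assumption; the matrix `rho` is an arbitrary random example.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = rng.normal(size=(6, 6))  # a generic, non-symmetric operator on R^6

# Singular values play the role of the numbers lambda_j in the decomposition.
lam = np.linalg.svd(rho, compute_uv=False)

trace_norm = lam.sum()                 # norm of K_1
hs_norm = np.sqrt((lam ** 2).sum())    # norm of K_2
uniform_norm = lam.max()               # uniform norm of L

# The Hilbert-Schmidt norm computed in the canonical basis gives the same
# value, illustrating that it does not depend on the orthonormal basis.
print(np.isclose(hs_norm, np.linalg.norm(rho, "fro")))
print(trace_norm >= hs_norm >= uniform_norm)
```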



Appendix B. Sketch of proofs. 



Proof of Theorem 2.2 

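We mimic the proof of Theorem 1 in Guillas [10]. To prove the existence, let 

$$\eta_m^{m'} = \sum_{j=m}^{m'} \Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j}).$$

Then 

$$E\big\|\eta_m^{m'}\big\|_H^2 = \sum_{j=m}^{m'} E\Big\|\Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j})\Big\|_H^2,$$

where we used that the independence between $V$ and $\epsilon$ gives 

$$E\Big\langle \Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j}),\ \Big(\prod_{p=0}^{j'-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j'})\Big\rangle_H = 0$$

for $j \ne j'$. Finally, we obtain 

$$E\Big\|\Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j})\Big\|_H^2 \le \sigma^2 M^{2j}.$$

The upper bound is the general term of a convergent series. For $m, m'$ tending 
to infinity, $\eta_m^{m'}$ tends to zero in mean square, and the Cauchy criterion gives the mean square 
convergence of the solution. 

Now, consider the stationary process $W_n = a + \sum_{j=0}^{\infty} \big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\big)(\epsilon_{n-j})$. From the almost 
sure boundedness of $\rho_{V_n}$ we have that it is indeed a solution of the CARH process: 

$$\begin{aligned}
(W_n - a) - \rho_{V_n}(W_{n-1} - a)
&= \sum_{j=0}^{\infty} \Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j}) - \rho_{V_n} \sum_{j=0}^{\infty} \Big(\prod_{p=0}^{j-1} \rho_{V_{n-1-p}}\Big)(\epsilon_{n-1-j}) \\
&= \epsilon_n + \sum_{j=1}^{\infty} \Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j}) - \sum_{j=0}^{\infty} \Big(\prod_{p=0}^{j} \rho_{V_{n-p}}\Big)(\epsilon_{n-1-j}) \\
&= \epsilon_n + \sum_{j=1}^{\infty} \Big(\prod_{p=0}^{j-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j}) - \sum_{j'=1}^{\infty} \Big(\prod_{p=0}^{j'-1} \rho_{V_{n-p}}\Big)(\epsilon_{n-j'}) \\
&= \epsilon_n.
\end{aligned}$$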
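The identity between the recursion and its series solution can be illustrated numerically. The sketch below works in $H = \mathbb{R}^3$ with a hypothetical operator family `rho(v)` whose uniform norm stays strictly below one, Gaussian innovations and a uniform covariate; all of these choices are assumptions of the example. Starting the recursion from the mean makes the truncated series and the recursion agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 200
a = np.array([1.0, -0.5, 2.0])  # mean element of the process

BASE = np.array([[0.5, 0.1, 0.0],
                 [0.0, 0.4, 0.1],
                 [0.1, 0.0, 0.3]])

def rho(v):
    """Hypothetical operator family; its norm is at most that of BASE (< 1)."""
    return np.cos(v) * BASE

V = rng.uniform(-np.pi, np.pi, size=T)   # covariate path
eps = rng.normal(size=(T, d))            # innovations

# Recursion: W_t - a = rho_{V_t}(W_{t-1} - a) + eps_t, started from W = a.
dev = np.zeros(d)
for t in range(T):
    dev = rho(V[t]) @ dev + eps[t]
W_recursive = a + dev

# Series: W_n = a + sum_j (prod_{p<j} rho_{V_{n-p}}) eps_{n-j}, with n = T - 1.
prod = np.eye(d)
series = np.zeros(d)
for j in range(T):
    series += prod @ eps[T - 1 - j]
    prod = prod @ rho(V[T - 1 - j])
W_series = a + series

print(np.allclose(W_recursive, W_series))
```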



Proof of Proposition 3.2 

The proof is based on the classical decomposition of the estimation error into bias and variance terms. 
The bias term is purely analytical. The variance term is made up of the variances and covariances 
of the estimator's terms. The dependence of the data is controlled by means of the following 
exponential inequality (a proof can be found in Bosq and Blanke [13, p. 140]). 

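Lemma B.1. Let $W = (W_t)$ be a zero-mean real-valued stationary process with 
$\sup_{1 \le t \le n} \|W_t\|_\infty = M < \infty$ $(M > 0)$. Then for $q \in [1, n/2]$, $\kappa > 0$, $\varepsilon > 0$, $p = n/(2q)$, 

$$P\Big(\Big|\sum_{i=1}^{n} W_i\Big| > n\varepsilon\Big) \le \frac{8M}{\varepsilon\kappa}(1+\kappa)\,\alpha\Big(\Big[\frac{n}{2q}\Big]\Big) + 4\exp\Big(-\frac{n^2\varepsilon^2/q}{8(1+\kappa)\sigma^2(q) + \frac{2M}{3}(1+\kappa)\,n^2 q^{-2}\varepsilon}\Big), \tag{B.1}$$

with $\sigma^2(q)$ an intricate quantity involving the pairwise covariances of $W$. We will only need a bound 
of $\sigma^2(q)$ that in the stationary case turns out to be 

$$\sigma^2(q) \le ([p]+2)\Big(\mathrm{Var}(W_0) + 2\sum_{i=1}^{[p]+1} |\mathrm{Cov}(W_0, W_i)|\Big). \tag{B.2}$$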

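Proof of 1. One has 

$$E f_n(v) - f(v) = \int_{\mathbb{R}^d} K(u)\,\big(f(v - h_n u) - f(v)\big)\,du.$$

Using the Taylor formula and the symmetry of $K$ one gets 

$$E f_n(v) - f(v) = \frac{h_n^2}{2} \int_{\mathbb{R}^d} K(u) \Big(\sum_{i,j=1}^{d} u_i u_j \frac{\partial^2 f}{\partial v_i \partial v_j}(v - \theta h_n u)\Big)\,du,$$

where $0 < \theta < 1$. Finally, the Lebesgue dominated convergence theorem gives 

$$h_n^{-2}\,|E f_n(v) - f(v)| \to b_2(v) = \frac{1}{2}\Big|\sum_{i,j=1}^{d} \frac{\partial^2 f}{\partial v_i \partial v_j}(v) \int_{\mathbb{R}^d} u_i u_j K(u)\,du\Big|. \tag{B.3}$$

We use (B.1) to deal with the variance term $f_n(v) - E f_n(v)$. Define $W_i = K_h(v - V_i) - E K_h(v - V_i)$, 
with $K_h(\cdot) = h^{-d} K(\cdot/h)$. Then $M = 2h^{-d}\|K\|_\infty$. Let us choose $q_n = n/(2 p_0 \ln n)$ for some $p_0 > 0$, 
which yields a logarithmic order for $p_n = p_0 \ln n$. These choices and the boundedness of $f$ and $G_{s,t}$ 
entail, from (B.2), 

$$\sigma^2(q_n) \le (p_n + 2)\,\mathrm{Var}(K_h(v - V_1)) + (p_n + 2)^2 \sup_{|s-t| \ge 1} \|G_{s,t}\|_\infty \le p_n h_n^{-d} \|K\|_2^2 f(v)(1 + o(1)).$$

Now take $\varepsilon = \eta \sqrt{\ln n / (n h_n^d)}$ for some $\eta > 0$; then 

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^{n} W_i\Big| > \eta\sqrt{\frac{\ln n}{n h_n^d}}\Big) \le \frac{8M}{\varepsilon\kappa}(1+\kappa)\,\alpha([p_n]) + 4\exp\Big(-\frac{\eta^2 \ln n}{4(1+\kappa)^2 \|K\|_2^2 f(v)(1 + o(1))}\Big).$$

If we take $\eta > 2(1+\kappa)\|K\|_2 \sqrt{f(v)}$ and $p_0 > 2/\beta$, then both terms are $o(n^{-\lambda})$ for some 
$\lambda > 1$, in which case 

$$\sum_{n} P\Big(\Big(\frac{n}{\ln n}\Big)^{2/(4+d)} \Big|\frac{1}{n}\sum_{i=1}^{n} W_i\Big| > \eta\, c_n^{-d/2}\Big) < \infty.$$

So the Borel-Cantelli lemma implies $\limsup_{n\to+\infty} (n/\ln n)^{2/(4+d)}\,|f_n(v) - E f_n(v)| \le 2c^{-d/2}(1+\kappa)\|K\|_2\sqrt{f(v)}$ 
almost surely for all $\kappa > 0$. We have finally 

$$\limsup_{n\to+\infty} \Big(\frac{n}{\ln n}\Big)^{2/(4+d)} |f_n(v) - f(v)| \le 2c^{-d/2}\|K\|_2\sqrt{f(v)} + c^2\,|b_2(v)|,$$

which gives (13). 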
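To make the objects of this proof concrete, here is a small univariate sketch ($d = 1$) of a kernel density estimator with a bandwidth of order $(\ln n / n)^{1/(4+d)}$, the order suggested by the normalising sequence used above. The Gaussian kernel, the constant `c = 1`, the sample size and the independent sample are assumptions of the example (the paper works with dependent data).

```python
import math
import random

def gaussian_kernel(u):
    """A symmetric kernel K; the Gaussian choice is an assumption."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def bandwidth(n, d=1, c=1.0):
    """h_n = c (ln n / n)^(1 / (4 + d)); c is a tuning constant."""
    return c * (math.log(n) / n) ** (1.0 / (4 + d))

def kernel_density(v, sample, h):
    """Univariate estimator f_n(v) = (n h)^-1 sum_i K((v - V_i) / h)."""
    return sum(gaussian_kernel((v - x) / h) for x in sample) / (len(sample) * h)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
h = bandwidth(len(sample))
estimate = kernel_density(0.0, sample, h)
truth = gaussian_kernel(0.0)  # standard normal density at 0
print(h, estimate, truth)
```

The gap between `estimate` and `truth` combines the two terms studied in the proof: a bias of order $h_n^2$ and a stochastic term of order $\sqrt{\ln n / (n h_n^d)}$.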



Proof of 2. We use the following decomposition, omitting the argument $v$: 

$$a_n - a = \frac{g_{v,n} - a f_n}{f_n}.$$

From (13) we have for the denominator that $f_n \to f(v)$ almost surely. We work out the numerator 
through the following decomposition into variance and bias terms. Let $\psi_n = (n/\ln n)^{2/(4+d)}$; 
then one has 

$$\psi_n |g_{v,n} - a f_n| \le A_n + \psi_n |B_n|, \qquad A_n = \psi_n \big|g_{v,n} - a f_n - E(g_{v,n} - a f_n)\big|, \quad B_n = E(g_{v,n} - a f_n).$$

We first study $A_n$ using, as before, the exponential inequality (B.1) with the redefined 
random variables 

$$W_i = K_h(v - V_i)(Y_i - a_v) - E\big(K_h(v - V_i)(Y_i - a_v)\big)$$

with the previous choices of $q_n$ and $p_n$. First, one has $|W_i| \le 2h^{-d}\|K\|_\infty(1 + o(1))$. Next, using 
the Bochner lemma (Bosq and Blanke [13, p. 135]) we obtain 

$$h^d\,\mathrm{Var}(W_i) \le h^d\,E\big[K_h^2(v - V_i)(Y_i - a_v)^2\big] \to f(v)\,\|K\|_2^2\,\Sigma(v),$$

where $\Sigma(v) = E^v[Y_0^2 \mid V] - a_v^2$ is the conditional variance parameter. The logarithmic order of $p_n$ 
and the control on the covariance terms give $\sigma^2(q_n) \le p_n h^{-d} f(v)\Sigma(v)\|K\|_2^2(1 + o(1))$. As before, taking 
$p_0 > 2/\beta$ and a large enough $\eta$, the Borel-Cantelli lemma entails 

$$\limsup_{n\to\infty} A_n \le 2c^{-d/2}\|K\|_2\sqrt{\Sigma(v) f(v)} \quad \text{a.s.}$$

For the bias term we write 

$$E(g_{v,n} - a(v) f_n) = h_n^{-d} \int_{\mathbb{R}^d} K\Big(\frac{v - t}{h_n}\Big)\big(g(t) - f(t)a(v)\big)\,dt.$$

Then, we use the Taylor formula to expand $g(t) - f(t)a(v)$ and Assumptions 3.1 iii-iv) to obtain 

$$\psi_n |B_n| \to c^2\,|b_a(v)|, \qquad b_a(v) = \frac{1}{2} \sum_{i,j=1}^{d} \Big(\frac{\partial^2 g}{\partial v_i \partial v_j}(v) - a(v)\frac{\partial^2 f}{\partial v_i \partial v_j}(v)\Big) \int_{\mathbb{R}^d} u_i u_j K(u)\,du.$$

Finally, putting all the elements together one obtains 

$$\limsup_{n\to\infty} \Big(\frac{n}{\ln n}\Big)^{2/(4+d)} |a_n(v) - a(v)| \le \frac{2c^{-d/2}\|K\|_2\sqrt{f(v)\Sigma(v)} + c^2\,|b_a(v)|}{f(v)}, \tag{B.4}$$

from which the result is derived. 
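The ratio form $a_n = g_{v,n} / f_n$ used in this proof is the usual kernel regression estimator. The sketch below evaluates it pointwise on a sampling grid for curve-valued responses; the Gaussian kernel, the scalar covariate and the toy curves are assumptions of the example. The common factor $(n h^d)^{-1}$ of numerator and denominator cancels, so it is omitted.

```python
import math

def gaussian_kernel(u):
    """Symmetric kernel K; the Gaussian choice is an assumption of this sketch."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def conditional_mean_curve(v, covariates, curves, h):
    """Kernel estimate a_n(v, .) = g_n / f_n, computed pointwise over the grid."""
    weights = [gaussian_kernel((v - x) / h) for x in covariates]
    f_n = sum(weights)  # proportional to the density estimate f_n(v)
    if f_n == 0.0:
        raise ValueError("no observation carries weight at v; increase h")
    grid_size = len(curves[0])
    return [sum(w * c[t] for w, c in zip(weights, curves)) / f_n
            for t in range(grid_size)]

# Hypothetical data: scalar covariate V_i and curves sampled on 3 grid points.
covariates = [-1.0, 0.0, 1.0]
curves = [[-2.0, 0.0, 1.0], [0.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
estimate = conditional_mean_curve(0.0, covariates, curves, h=0.5)
print(estimate)
```

In this toy example the first coordinate of the curves is linear in the covariate and the design is symmetric around $v = 0$, so the estimate at that coordinate vanishes by symmetry of the kernel.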



Proof of Proposition 3.3 

The only terms in Equation (B.4) that depend on the value fixed for $t$ are the conditional variance 
parameter $\Sigma$ and the bias $b_a$. With the new hypothesis holding uniformly, for each $v \in \mathbb{R}^d$, $\Sigma(v,t)$ 
and $b_a(v,t)$ are bounded uniformly on $[0, 1]$. Then, recalling that 

$$\|a_n(v, \cdot) - a(v, \cdot)\|_H^2 = \int_0^1 \big(a_n(v, t) - a(v, t)\big)^2\,dt,$$

we obtain the desired result. 



Proof of Proposition 3.4 

The proof follows the same lines as those used to show Proposition 3.2 2). In particular, 

$$r_n - r = \frac{g_{v,n} - r f_n}{f_n}$$

gives the decomposition into variance and bias terms, 

$$\psi_n |g_{v,n} - r f_n| \le A_n + \psi_n |B_n|, \qquad A_n = \psi_n \big|g_{v,n} - r f_n - E(g_{v,n} - r f_n)\big|, \quad B_n = E(g_{v,n} - r f_n).$$

This yields 

$$\limsup_{n\to\infty} A_n \le 2c^{-d/2}\|K\|_2\sqrt{\Sigma(v) f(v)} \quad \text{a.s.},$$

where, by the redefinition of $Y$, $\Sigma(v) = E^v[(Z_0(s)Z_0(t))^2 \mid V] - r(v, s, t)^2$. 

Again using the Taylor formula to expand $g(t) - f(t)r(v)$ and the previous assumptions we obtain 

$$\psi_n |B_n| \to c^2\,|b_r(v)|, \qquad b_r(v) = \frac{1}{2} \sum_{i,j=1}^{d} \Big(\frac{\partial^2 g}{\partial v_i \partial v_j}(v) - r(v)\frac{\partial^2 f}{\partial v_i \partial v_j}(v)\Big) \int_{\mathbb{R}^d} u_i u_j K(u)\,du.$$

Finally, assembling the terms we get the equivalent of Equation (B.4) with the redefined $\Sigma$ and 
the bias $b_r$, from which the result is derived. 



Proof of Proposition 3.5 

First, consider the following decomposition 

$$\Gamma_{v,n} = R_n(v) - \tilde a_n(v) \otimes a_n(v) - a_n(v) \otimes \tilde a_n(v) + a_n(v) \otimes a_n(v),$$

where $R_n(v) = \sum_{i=1}^{n} w_{n,i}(v, h_\gamma)\, Z_i \otimes Z_i$ is the empirical counterpart of the second order moment 
operator $R(v) = E^v[Z \otimes Z \mid V]$, and $\tilde a_n(v) = \sum_{i=1}^{n} w_{n,i}(v, h_\gamma)\, Z_i$. Second, we obtain that 

$$\Gamma_v - \Gamma_{v,n} = R(v) - R_n(v) - a(v) \otimes a(v) + \tilde a_n(v) \otimes a_n(v) + a_n(v) \otimes \tilde a_n(v) - a_n(v) \otimes a_n(v).$$

Hence, we can control the estimation error by regrouping the terms of the above decomposition (we 
drop the argument $v$), 

$$\|\Gamma_v - \Gamma_{v,n}\|_{\mathcal{K}_2} \le \|R - R_n\|_{\mathcal{K}_2} + \|\tilde a_n \otimes a_n - a \otimes a\|_{\mathcal{K}_2} + \|a_n \otimes (\tilde a_n - a_n)\|_{\mathcal{K}_2}. \tag{B.5}$$



From Propositions 3^ and 3^3 it follows that 

$$\|R - R_n\|_{\mathcal{K}_2} = O\Big(\Big(\frac{\ln n}{n}\Big)^{2/(4+d)}\Big) \quad \text{a.s.}$$



The second term of the right-hand side of Equation (B.5) is equal to 

$$\|\tilde a_n \otimes (a_n - a) + (\tilde a_n - a) \otimes a\|_{\mathcal{K}_2} \le \|\tilde a_n\|_H\,\|a_n - a\|_H + \|\tilde a_n - a\|_H\,\|a\|_H.$$

Since both $\|a\|_H$ and $\|\tilde a_n\|_H$ are bounded, and using Proposition 3.3 successively for $\tilde a_n$ and $a_n$ with 
their respective sequences of bandwidths $h_{\gamma,n}$ and $h_{a,n}$, we obtain that 

$$\|\tilde a_n \otimes a_n - a \otimes a\|_{\mathcal{K}_2} = O\Big(\Big(\frac{\ln n}{n}\Big)^{2/(4+d)}\Big) \quad \text{a.s.}$$

With a similar reasoning, the same kind of result is obtained for the third term in (B.5). Putting 
the results for the three terms together concludes the proof. 
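The algebra behind this kind of decomposition can be checked in finite dimension. The sketch below assumes that the covariance estimator is the kernel-weighted covariance of the curves centred at `a_n`, with weights built from the bandwidth `h_gamma`, while `a_n` itself uses a second bandwidth `h_a`; this form, the Gaussian weights and all numerical values are assumptions of the example. The name `a_tilde` stands for the mean estimate computed with `h_gamma`.

```python
import numpy as np

def nw_weights(v, V, h):
    """Normalised Gaussian kernel weights w_{n,i}(v, h); they sum to one."""
    k = np.exp(-0.5 * ((v - V) / h) ** 2)
    return k / k.sum()

rng = np.random.default_rng(2)
n, p = 50, 4
V = rng.uniform(size=n)          # scalar covariate
Z = rng.normal(size=(n, p))      # discretised curves, one per row
v, h_gamma, h_a = 0.5, 0.2, 0.1

w_gamma = nw_weights(v, V, h_gamma)
a_tilde = w_gamma @ Z                     # mean estimate with bandwidth h_gamma
a_n = nw_weights(v, V, h_a) @ Z           # mean estimate with bandwidth h_a
R_n = (Z * w_gamma[:, None]).T @ Z        # weighted second order moment

centred = Z - a_n
gamma_direct = (centred * w_gamma[:, None]).T @ centred
gamma_decomposed = (R_n - np.outer(a_tilde, a_n) - np.outer(a_n, a_tilde)
                    + np.outer(a_n, a_n))
print(np.allclose(gamma_direct, gamma_decomposed))
```

When the two bandwidths coincide, `a_tilde` equals `a_n` and the expression reduces to the familiar `R_n - a_n ⊗ a_n`.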



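Proof of Corollary 3.6 

The first item is a direct consequence of the following property of the eigenvalues of compact linear 
operators (Bosq [15, p. 104]), 

$$\sup_{j \ge 1} |\lambda_j(v) - \lambda_{j,n}(v)| \le \|\Gamma_v - \Gamma_{v,n}\|_{\mathcal{L}},$$

and the asymptotic result obtained for $\|\Gamma_v - \Gamma_{v,n}\|_{\mathcal{K}_2}$. 

For the second item, Bosq [15, Lemma 4.3] shows that, for each $j \ge 1$, 

$$\|e_j(v) - e'_{j,n}(v)\|_H \le \xi_j\,\|\Gamma_v - \Gamma_{v,n}\|_{\mathcal{L}},$$

where the constant $\xi_j$ depends only on the gaps between consecutive eigenvalues of $\Gamma_v$. 
Again, the rates of convergence follow from Proposition 3.5. 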
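The eigenvalue bound used for the first item can be illustrated with symmetric matrices, for which it is Weyl's inequality. In the sketch below, `gamma` plays the role of the covariance operator and `gamma_n` that of its estimate; both are arbitrary random examples and not estimates from data.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
gamma = A @ A.T                          # symmetric, non-negative operator
E = 0.01 * rng.normal(size=(5, 5))
gamma_n = gamma + (E + E.T) / 2.0        # symmetric perturbation

# Eigenvalues sorted in decreasing order, as for covariance operators.
lam = np.linalg.eigvalsh(gamma)[::-1]
lam_n = np.linalg.eigvalsh(gamma_n)[::-1]

largest_gap = np.max(np.abs(lam - lam_n))
uniform_norm = np.linalg.norm(gamma - gamma_n, 2)
print(largest_gap <= uniform_norm + 1e-12)
```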
Proof of Proposition 3.7 

The proof follows the same guidelines as that of Proposition 3.5, replacing $R_n(v)$ and $R(v)$ 
by $R_{1,n}(v) = \sum_{i=1}^{n-1} w_{n,i}(v, h_\gamma)\, Z_i(s) Z_{i+1}(t)$ and $R_1(v) = E^v[Z_0(s) Z_1(t) \mid V]$ respectively. Then, a 
decomposition like (B.5) and the same kind of observations made for that proof entail the result. 



Proof of Theorem 3.9 



The proof follows along the same lines as that of Proposition 4.6 in Bosq [15], using Propositions 3.3, 
3.5, 3.7 and Corollary 3.6. 



Proof of Theorem 3.10 

The proof follows along the same lines as that of Proposition 3 in [12, Chapter 3], using Propositions 
3.3, 3.5 and 3.7. 



References 

[1] D. Bosq, Modelization, nonparametric estimation and prediction for continuous time processes, 
in: G. Roussas (Ed.), Nonparametric functional estimation and related topics, NATO 
ASI Series, 1991, pp. 509-529. 

[2] P. Besse, H. Cardot, Approximation spline de la prévision d'un processus fonctionnel 
autorégressif d'ordre 1, Canadian Journal of Statistics 24 (1996) 467-487. 

[3] B. Pumo, Prediction of continuous time processes by C[0, 1]-valued autoregressive process, 
Statistical Inference for Stochastic Processes 1 (1998) 297-309. 

[4] A. Antoniadis, T. Sapatinas, Wavelet methods for continuous-time prediction using Hilbert- 
valued autoregressive processes, Journal of Multivariate Analysis 87 (2003) 133-158. 

[5] V. Kargin, A. Onatski, Curve forecasting by functional autoregression, Journal of Multivariate 
Analysis 99 (2008) 2508-2526. 

[6] A. Mas, B. Pumo, Linear processes for functional data, in: F. Ferraty, Y. Romain (Eds.), The 
Oxford Handbook of Functional Data Analysis, Oxford Handbooks in Mathematics, Oxford 
University Press, 2011, pp. 47-71. 

[7] B. Pumo, Estimation et prévision de processus autorégressifs fonctionnels, Ph.D. thesis, 
University of Paris 6, 1992. 

[8] A. Mas, B. Pumo, The ARHD process, Journal of Statistical Planning and Inference 137 (2007) 
538-553. 

[9] J. Damon, S. Guillas, The inclusion of exogenous variables in functional autoregressive ozone 
forecasting, Environmetrics 13 (2002) 759-774. 

[10] S. Guillas, Doubly stochastic Hilbertian processes, Journal of Applied Probability 39 (2002) 
566-580. 

[11] T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 1976. 

[12] A. Mas, Estimation d'opérateurs de corrélation de processus fonctionnels: lois limites, tests, 
déviations modérées, Ph.D. thesis, Université Paris 6, 2000. 






[13] D. Bosq, D. Blanke, Inference and Prediction in Large Dimensions, Wiley series in probability 
and statistics, John Wiley & Sons, Ltd., 2007. 

[14] S. Dabo-Niang, N. Rhomari, Kernel regression estimation in a Banach space, Journal of 
Statistical Planning and Inference 139 (2009) 1421-1434. 

[15] D. Bosq, Linear processes in function spaces: Theory and applications, Springer-Verlag, New 
York, 2000. 

[16] A. Mas, L. Menneteau, Perturbation approach applied to the asymptotic study of random 
operators, Progress in Probability 55 (2003) 127-133. 

[17] A. Mas, Weak convergence in the functional autoregressive model, Journal of Multivariate 
Analysis 98 (2007) 1231-1261. 

[18] A. Antoniadis, X. Brossat, J. Cugliari, J.-M. Poggi, Clustering functional data with wavelets, 
International Journal of Wavelets, Multiresolution and Information Processing, accepted for 
publication (2013). 






